ABSTRACT

Sequential quadratic programming methods as developed by Wilson, Han, and Powell have
gained considerable attention in the last few years, mainly because of their outstanding numerical
performance. Although the theoretical convergence aspects of this method and its various
modifications have been investigated in the literature, there still remain some open questions
which will be treated in this paper. The convergence theory to be presented takes into account
the additional variable introduced in the quadratic programming subproblem to avoid inconsistency,
the one-dimensional minimisation procedure, and, in particular, an "active set" strategy
to avoid the recalculation of unnecessary gradients. This paper also contains a detailed mathematical
description of a nonlinear programming algorithm which has been implemented by the
author. The usage of the code and detailed numerical test results are presented in [15].



This  research  was  supported  by  the  Deutsche  Forschungsgemeinschaft  while  the  author  was 
visiting  the  Systems  Optimisation  Laboratory. 




1.  Introduction 

Consider  the  general  nonlinear  optimisation  problem 

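\[
\begin{array}{ll}
\text{minimise} & f(x) \\
x \in \mathbb{R}^n: & g_j(x) = 0, \quad j = 1, \dots, m_e, \\
 & g_j(x) \ge 0, \quad j = m_e + 1, \dots, m,
\end{array}
\tag{1}
\]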

with continuously differentiable functions $f$ and $g_j$, $j = 1, \dots, m$. One of the most effective tools
available today for solving (1) is the sequential quadratic programming algorithm as developed
by Wilson [16], Han [6], and Powell [10]. In this method, a line search is performed along a
search direction obtained by solving a quadratic programming subproblem. The algorithmic (see
Han [5], Powell [11]) and numerical (see [12]) behaviour of the method have been examined and
various modifications have been proposed to overcome certain difficulties. For example, the line
search procedure may impede superlinear convergence (see Maratos [9]), and the algorithm may
cycle (see Chamberlain [2]). One possible remedy (see [13]) is to replace the non-differentiable
$L_1$ line search function used by Han and Powell by a differentiable augmented Lagrange function.
However, the convergence analysis of the original method and of the above modification is based
on some assumptions which are often not satisfied in practice, and there are a few additional
numerical drawbacks:

1.  All convergence proofs known so far to the author assume that every quadratic subproblem is
feasible. However, this assumption is not always satisfied and Powell [10] proposed the introduction
of an additional variable in the subproblem to guarantee consistency. It will be shown that
the resulting algorithm will converge if the corresponding penalty parameter is sufficiently large.
A lower bound for the choice of this penalty parameter is given.

2.  The  convergence  proof  of  Han  [6]  is  based  on  an  Armijo-type  line  search  procedure.  However, 
this  could  lead  to  an  inefficient  algorithm  and  Powell  [10]  proposed  a  combination  of  the  Armijo- 
type  line  search  with  a  quadratic  approximation.  This  modification  leads  to  a  slight  alteration  of 
the  existing  convergence  proof. 

3.  A  numerical  drawback  of  the  method  of  Wilson,  Han,  and  Powell  is  the  unnecessary  calculation 
of  the  gradients  of  constraints  which  are  inactive  at  the  optimal  solution.  The  experimental 
tests  of  [14,15]  show  that  an  "active  set"  strategy  can  lead  to  a  considerable  saving  of  gradient 
calculations.  It  remains  to  be  seen  whether  it  is  possible  to  prove  the  convergence  of  the  resulting 
algorithm. 

4.  The augmented Lagrange function defined in [13] for the line search calculation uses one
monotone increasing penalty parameter for all constraints. To improve the robustness of the
algorithm, the penalty parameters are now chosen individually for each constraint, their calculation
is simplified, and they are allowed to decrease at the beginning of the algorithm.

5.  Any  sequential  quadratic  programming  algorithm  will  have  difficulties  in  finding  a  suitable 
descent  direction  for  the  line  search  function,  if  the  quadratic  subproblem  does  not  satisfy 
a  constraint  qualification.  A  remedy  will  be  proposed  in  this  paper  based  on  an  augmented 
Lagrangian  type  search  direction. 

Point 3 mentioned above is of special importance. One of the basic open questions in nonlinear
programming is whether an active set strategy leading to equality constrained subproblems
will be superior to a sequential quadratic programming algorithm with inequality constraints, or
vice versa. It is likely that only a combination of both approaches will lead to an efficient, robust,
and generally applicable algorithm, and one could consider the proposed "active set" modification
as a first approach in finding a suitable compromise.


Convergence of a sequential quadratic programming method

In Section 2 of the paper, the augmented Lagrangian line search function and the quadratic
subproblem are defined. The algorithm is outlined in Section 3 together with some implementation
remarks. Section 4 contains the global convergence analysis and further remarks are given in
Section 5.


2.  Basic concepts

An  important  tool  in  nonlinear  programming  is  the  Lagrange  function 

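\[ L(x, u) = f(x) - \sum_{j=1}^{m} u_j g_j(x) \tag{2} \]

with $x \in \mathbb{R}^n$, $u = (u_1, \dots, u_m)^T \in \mathbb{R}^m$, which is involved in the well-known necessary optimality
conditions, i.e. the Kuhn-Tucker conditions for problem (1):

\[
\begin{array}{lll}
a) & \nabla_x L(x, u) = 0, & \\
b) & g_j(x) = 0, & j = 1, \dots, m_e, \\
c) & g_j(x) \ge 0, & j = m_e + 1, \dots, m, \\
d) & u_j \ge 0, & j = m_e + 1, \dots, m, \\
e) & g_j(x) u_j = 0, & j = m_e + 1, \dots, m.
\end{array}
\tag{3}
\]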

Here, $\nabla_x$ denotes differentiation with respect to the $x$-variables. A sequential quadratic
programming algorithm proceeds from a quadratic approximation of the Lagrange function (2)
and a linearisation of the constraints. If $x_k$ denotes the $k$-th estimate for the optimal solution and
$B_k$ a symmetric matrix that approximates the Hessian of the Lagrange function, the resulting
quadratic programming subproblem can be written in the form

\[
\begin{array}{ll}
\text{minimise} & \tfrac{1}{2} d^T B_k d + \nabla f(x_k)^T d \\
d \in \mathbb{R}^n: & \nabla g_j(x_k)^T d + g_j(x_k) = 0, \quad j = 1, \dots, m_e, \\
 & \nabla g_j(x_k)^T d + g_j(x_k) \ge 0, \quad j = m_e + 1, \dots, m.
\end{array}
\tag{4}
\]

The next iterate is given by

\[ x_{k+1} = x_k + \alpha_k d_k, \]

where $d_k$ denotes the solution of (4) and $\alpha_k$ a steplength parameter which will be discussed later.
A numerical drawback of using (4) is that all gradients of the constraints must be evaluated in
each iteration step, even if $x_k$ is close to the solution and we can suppose therefore that the
calculation of inactive nonlinear constraints is unnecessary. This statement is at least true if
we expect that nonlinear constraints inactive at the optimal solution correspond to linearised
constraints inactive at a solution of (4). To avoid this situation and to improve the efficiency of
the algorithm, an alternative subproblem may be defined as follows:

\[
\begin{array}{ll}
\text{minimise} & \tfrac{1}{2} d^T B_k d + \nabla f(x_k)^T d \\
d \in \mathbb{R}^n: & \nabla g_j(x_k)^T d + g_j(x_k) \; \{=, \ge\} \; 0, \quad j \in J_k^*, \\
 & \nabla g_j(x_{k(j)})^T d + g_j(x_k) \ge 0, \quad j \in K_k^*.
\end{array}
\tag{5}
\]


$J_k^*$ and $K_k^*$ are two disjoint index sets with $J_k^* \cup K_k^* = \{1, \dots, m\}$. $J_k^*$ is called the set
of the active constraints including the equality constraints, and $K_k^*$ is the set of the inactive
constraints. The indices $k(j) \le k$ correspond to previous iterates and their definition will be
clear when investigating the algorithm.

To motivate the choice of the active set $J_k^*$, we observe that the algorithm approximates
not only the optimal solution by $x_k$, but also the optimal Lagrange multipliers. The variables
corresponding to the Lagrange multipliers are denoted by $v_k = (v_1^{(k)}, \dots, v_m^{(k)})^T$. A constraint is
called active, i.e. its index is in $J_k^*$, if its function value is not positive or if the corresponding
multiplier is greater than zero. Given a constant $\varepsilon > 0$ and any iterates $x_k$ and $v_k$, we set

\[
\begin{array}{l}
J_k^* = \{1, \dots, m_e\} \cup \{ j : m_e < j \le m, \; g_j(x_k) \le \varepsilon \text{ or } v_j^{(k)} > 0 \}, \\
K_k^* = \{1, \dots, m\} \setminus J_k^*.
\end{array}
\tag{6}
\]

If $x_k$ is feasible, $v_k$ is replaced by the optimal Lagrange multiplier of (1), and $\varepsilon$ is sufficiently
small, then $J_k^*$ defines the constraints which are active at the optimal solution of (1). By using
the condition $g_j(x_k) \le \varepsilon$ instead of $g_j(x_k) \le 0$, we attempt to avoid the situation in which $g_j(x_k)$
tends to zero for $j \in K_k^*$.
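
The selection rule (6) can be illustrated by a short sketch. The function name `active_set` and its argument layout are assumptions made for this illustration only; they are not taken from the implementation described in [15]. Constraint indices are 1-based to match the text, with the first `m_e` constraints being equalities.

```python
def active_set(g, v, m_e, eps):
    """Split the constraint indices 1..m into (J_star, K_star) following rule (6).

    g   : constraint values g_j(x_k), j = 1..m
    v   : multiplier estimates v_j^(k), j = 1..m
    m_e : number of equality constraints (always treated as active)
    eps : the tolerance epsilon > 0 of rule (6)
    """
    m = len(g)
    # Active: equality constraint, or value at most eps, or positive multiplier.
    J_star = [j for j in range(1, m + 1)
              if j <= m_e or g[j - 1] <= eps or v[j - 1] > 0.0]
    # Inactive: everything else.
    K_star = [j for j in range(1, m + 1) if j not in J_star]
    return J_star, K_star
```

For example, with one equality constraint and three inequalities, a constraint sitting exactly on its bound and a constraint with a positive multiplier estimate are both classified as active, while a clearly satisfied constraint with zero multiplier is not.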

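However, the linear constraints in (4) or (5) can become inconsistent even if we assume that
the original problem (1) is solvable. As in Powell [10], an additional variable $\delta$ is introduced in
(5), leading to an $(n+1)$-dimensional subproblem with consistent constraints:

\[
\begin{array}{ll}
\text{minimise} & \tfrac{1}{2} d^T B_k d + \nabla f(x_k)^T d + \tfrac{1}{2} \rho_k \delta^2 \\
d \in \mathbb{R}^n, \; \delta \in \mathbb{R}: & \nabla g_j(x_k)^T d + (1 - \delta) g_j(x_k) \; \{=, \ge\} \; 0, \quad j \in J_k^*, \\
 & \nabla g_j(x_{k(j)})^T d + g_j(x_k) \ge 0, \quad j \in K_k^*, \\
 & 0 \le \delta \le 1.
\end{array}
\tag{7}
\]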


Obviously, the point $d_0 = 0$, $\delta_0 = 1$ satisfies the constraints of (7), since $g_j(x_k) > 0$ for
all $j \in K_k^*$. We conclude that (7) has a finite unique solution provided that the matrix $B_k$ is
positive-definite. The additional penalty parameter $\rho_k$ can be chosen by


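\[
\rho_k = \max\left( \rho_0, \; \frac{\rho^* \, (d_{k-1}^T A_{k-1} u_{k-1})^2}{(1 - \delta_{k-1})^2 \, d_{k-1}^T B_{k-1} d_{k-1}} \right)
\tag{8}
\]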


for $k > 0$ and a constant $\rho^* > 1$. A motivation for this rule is given by the convergence analysis
of Section 4. Here, $A_{k-1}$ denotes the matrix


Now assume that we have succeeded in solving (7), giving us a search direction $d_k$ and a
multiplier $u_k = (u_1^{(k)}, \dots, u_m^{(k)})^T$. As mentioned above, the variables and the multipliers are
updated simultaneously by

\[ x_{k+1} = x_k + \alpha_k d_k, \qquad v_{k+1} = v_k + \alpha_k (u_k - v_k). \]


The steplength parameter $\alpha_k$ is obtained by minimising a line search function or merit
function. Han [6] and Powell [10] used the non-differentiable $L_1$-penalty function

\[
p(x, r) = f(x) + \sum_{j=1}^{m_e} r_j \, |g_j(x)| + \sum_{j=m_e+1}^{m} r_j \, |\min(0, g_j(x))|
\tag{9}
\]

with $r = (r_1, \dots, r_m)^T$. The use of (9) alone can lead to two difficulties. First, the superlinear
convergence may be impeded even in an arbitrarily small neighbourhood of the solution (see
Maratos [9]). Second, the algorithm may cycle if the penalty parameters are chosen improperly
(see Chamberlain [2]). To overcome these situations, (9) has been replaced by the differentiable
augmented Lagrange function in [13], i.e. by


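\[
\Phi_r(x, v) = f(x) - \sum_{j \in J} \left( v_j g_j(x) - \tfrac{1}{2} r_j g_j(x)^2 \right) - \tfrac{1}{2} \sum_{j \in K} \frac{v_j^2}{r_j}.
\tag{10}
\]

Here, the index sets $J$ and $K$ are defined by

\[
\begin{array}{l}
J = \{1, \dots, m_e\} \cup \{ j : m_e < j \le m, \; g_j(x) \le v_j / r_j \}, \\
K = \{1, \dots, m\} \setminus J.
\end{array}
\tag{11}
\]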
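
As an illustration of (10) and (11), the following sketch evaluates the augmented Lagrange function from already computed function values. The name `augmented_lagrangian` and the array-based interface are assumptions for this example, not part of the original code; indices are 0-based internally, with the first `m_e` entries being equality constraints.

```python
import numpy as np

def augmented_lagrangian(f_val, g_val, v, r, m_e):
    """Evaluate Phi_r(x, v) of (10) given f(x), g(x), multipliers v, penalties r > 0."""
    g_val, v, r = (np.asarray(a, dtype=float) for a in (g_val, v, r))
    m = g_val.size
    # Index set J of (11): all equalities plus inequalities with g_j <= v_j / r_j.
    in_J = np.arange(m) < m_e
    in_J |= g_val <= v / r
    # Contribution of J: v_j g_j - 0.5 r_j g_j^2.
    term_J = v[in_J] * g_val[in_J] - 0.5 * r[in_J] * g_val[in_J] ** 2
    # Contribution of K (the complement of J): v_j^2 / r_j.
    term_K = v[~in_J] ** 2 / r[~in_J]
    return f_val - term_J.sum() - 0.5 * term_K.sum()
```

At the switching point $g_j(x) = v_j / r_j$ both branches contribute $\tfrac{1}{2} v_j^2 / r_j$, which is why the function remains continuous when an index moves between $J$ and $K$.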


However, we must be very careful when replacing (9) by (10) in an optimisation algorithm.
The difficulty arises that a solution of (1) is only a saddle point of the function $\Phi_r$ with respect
to the variables $(x, v)$. In other words, a formulation of an optimisation algorithm as a descent
method for $\Phi_r$ with a constant penalty parameter $r$ can lead to a sequence $(x_k, v_k)$ that tends to
infinity, even if the feasible region of (1) is compact. To avoid this undesirable behaviour of the
algorithm, the penalty parameter in (10) must be adapted in an appropriate way and is defined
by

\[
r_j^{(k+1)} = \max\left( \sigma_j^{(k)} r_j^{(k)}, \; \frac{2m \, (u_j^{(k)} - v_j^{(k)})^2}{(1 - \delta_k) \, d_k^T B_k d_k} \right), \quad j = 1, \dots, m,
\tag{12}
\]
\[ r_{k+1} = (r_1^{(k+1)}, \dots, r_m^{(k+1)})^T, \]

where $(d_k, u_k)$ is a Kuhn-Tucker point of the quadratic subproblem, $\delta_k$ the additional variable from
(7) ($\delta_k < 1$), $r_k$ the previous penalty parameter, $v_k$ the current approximation of the multiplier
vector, and $B_k$ a positive-definite approximation of the Hessian of the Lagrange function. The
function

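\[
\varphi_k(\alpha) = \Phi_{r_{k+1}}\left( \begin{pmatrix} x_k \\ v_k \end{pmatrix} + \alpha \begin{pmatrix} d_k \\ u_k - v_k \end{pmatrix} \right)
\tag{13}
\]

with

\[
\begin{array}{l}
J_k = \{1, \dots, m_e\} \cup \{ j : m_e < j \le m, \; g_j(x_k) \le v_j^{(k)} / r_j^{(k+1)} \}, \\
K_k = \{1, \dots, m\} \setminus J_k,
\end{array}
\tag{14}
\]

can be minimised with respect to $\alpha$, leading to a steplength $\alpha_k$. We must distinguish between
the index sets $J_k^*$, cf. (7), and $J_k$, cf. (14), which are both approximations of the optimal active
set of (1). It is easy to see that

\[ J_k^* \supset J_k, \qquad K_k^* \subset K_k. \tag{15} \]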

The sequence $\{\sigma_j^{(k)}\}$ is included in (12) to allow for decreasing penalty parameters at least
in the beginning of the algorithm, if we require that $\sigma_j^{(k)} \le 1$. On the other hand, it should
guarantee the convergence of $\{r_j^{(k)}\}$ whenever this sequence is bounded. A sufficient condition is
given by the following lemma.

(2.1) Lemma: Assume that $\{r_j^{(k)}\}$ is bounded, $\sigma_j^{(k)} \le 1$ for all $k$, and that

\[ \sum_{k=0}^{\infty} (1 - \sigma_j^{(k)}) < \infty, \quad 1 \le j \le m. \]

Then there is an $r_j^* \ge 0$ with

\[ \lim_{k \to \infty} r_j^{(k)} = r_j^*. \]

Proof: To simplify the notation, we omit the index $j$ and define $R$ as the upper bound of the
penalty parameters. Assume that there are two different accumulation points $r^*$ and $r^{**}$ of $\{r^{(k)}\}$
with $r^* < r^{**}$. Then we obtain for each $\varepsilon > 0$ infinitely many $k$ and $l_k$ with

\[ |r^{(k + l_k)} - r^*| \le \varepsilon, \qquad |r^{(k)} - r^{**}| \le \varepsilon. \]

Setting $l = l_k$ and choosing a sufficiently small $\varepsilon > 0$, we obtain from $r^{(i+1)} \ge \sigma^{(i)} r^{(i)}$, cf. (12),

\[
0 < r^{**} - r^* - 2\varepsilon \le -(r^{(k+l)} - r^{(k)})
= -\sum_{i=0}^{l-1} (r^{(k+i+1)} - r^{(k+i)})
\le R \sum_{i=0}^{l-1} (1 - \sigma^{(k+i)}).
\]

Since this inequality is valid for infinitely many $k$ and the right-hand side tends to zero, we get a
contradiction.
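
The update rule (12) can be sketched in a few lines. The function name `update_penalties` and its signature are assumptions for this illustration; `dBd` stands for the already computed inner product $d_k^T B_k d_k$, and the factors `sigma` are supplied by the caller.

```python
import numpy as np

def update_penalties(r, sigma, u, v, delta, dBd):
    """Penalty update (12): r_j <- max(sigma_j * r_j, 2m (u_j - v_j)^2 / ((1 - delta) d^T B d)).

    r, sigma, u, v : arrays of length m (sigma_j <= 1 permits a decrease)
    delta          : additional variable of subproblem (7), delta < 1
    dBd            : the scalar d_k^T B_k d_k > 0
    """
    r, sigma, u, v = (np.asarray(a, dtype=float) for a in (r, sigma, u, v))
    m = r.size
    # Lower bound that the convergence analysis uses to obtain a descent direction.
    lower = 2.0 * m * (u - v) ** 2 / ((1.0 - delta) * dBd)
    return np.maximum(sigma * r, lower)
```

In the small example used to check this sketch, the first parameter is allowed to shrink from 10 to 5 because its multiplier estimate is already consistent, while the second is raised from 1 to 8 by the lower bound.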

A possible choice of $\sigma_j^{(k)}$ could be


Since  we  expect  that  only  large  penalty  parameters  could  affect  the  performance  of  the  algorithm, 
we  can  replace  the  above  formula  by  a  simpler  approximation 


s 

t 


-  -V  St.JtM*<^-U 


3.  The algorithm

Now we are able to formulate the algorithm. First, some constants $\varepsilon$, $\beta$, $\mu$, $\bar\delta$, $\bar\rho$ have to be
chosen that are not changed within the algorithm and that satisfy

\[ \varepsilon > 0, \quad 0 < \beta < 1, \quad 0 < \mu < \tfrac{1}{2}, \quad 0 < \bar\delta < 1, \quad \bar\rho > 1. \tag{17} \]

The main steps consist of the following instructions:

(3.1) Algorithm:

0) Start: Choose some starting values $x_0 \in \mathbb{R}^n$, $v_0 \in \mathbb{R}^m$, $B_0 \in \mathbb{R}^{n \times n}$ positive-definite, $\rho_0 \in \mathbb{R}$,
$r_0 \in \mathbb{R}^m$, and evaluate $f(x_0)$, $g_j(x_0)$, $j = 1, \dots, m$, $\nabla f(x_0)$, $\nabla g_j(x_0)$, $j = 1, \dots, m$. Determine
$J_0^*$ and let $k(j) = 0$ for all $j \in K_0^*$.

For $k = 0, 1, 2, \dots$ compute $x_{k+1}$, $v_{k+1}$, $B_{k+1}$, $r_{k+1}$, $\rho_{k+1}$, and $J_{k+1}^*$ as follows:

1) Solve the quadratic subproblem (7) and denote by $d_k$, $\delta_k$ the optimal solution and by $u_k$ the
optimal multiplier. If $\delta_k > \bar\delta$, let $\rho_k = \bar\rho \rho_k$ and solve (7) again. If this loop fails within a given
upper bound for $\rho_k$, define

\[
d_k = -B_k^{-1} \nabla_x \Phi_{r_k}(x_k, v_k), \qquad
u_k = v_k - \nabla_v \Phi_{r_k}(x_k, v_k).
\tag{18}
\]

2) Determine the new penalty parameter $r_{k+1}$ by (12) and (16). If $d_k$, $u_k$ have been obtained by
(18), let $r_{k+1} = r_k$.

3) If $\varphi_k'(0) \ge 0$, let $\rho_k = \bar\rho \rho_k$ and go to 1).

4) Define the new penalty parameter $\rho_{k+1}$ by (8).

5) Perform a line search with respect to the function $\varphi_k(\alpha)$ defined by (13). Let $\alpha_{k,0} = 1$ and,
for $i = 1, 2, \dots$, let $i_k$ be the first index for which

\[ \varphi_k(\alpha_{k,i}) \le \varphi_k(0) + \mu \alpha_{k,i} \varphi_k'(0), \tag{19} \]

where $\alpha_{k,i} = \max(\beta \alpha_{k,i-1}, \bar\alpha_{k,i-1})$. Here, $\bar\alpha_{k,i-1}$ is the minimiser of a quadratic approximation
of $\varphi_k(\alpha)$ using $\varphi_k(0)$, $\varphi_k'(0)$, and $\varphi_k(\alpha_{k,i-1})$. Define

\[ \alpha_k = \alpha_{k,i_k}. \]

6) Let

\[ x_{k+1} = x_k + \alpha_k d_k, \qquad v_{k+1} = v_k + \alpha_k (u_k - v_k), \]

and evaluate $f(x_{k+1})$, $g_j(x_{k+1})$, $j = 1, \dots, m$, $\nabla f(x_{k+1})$, $J_{k+1}^*$ by (6), and $\nabla g_j(x_{k+1})$, $j \in J_{k+1}^*$.

7) Compute a suitable new positive-definite approximation of the Hessian of the Lagrange function,
i.e. $B_{k+1}$, set $k = k + 1$, and repeat the iteration.

The following remarks will illustrate further details of the algorithm. Numerical experience
shows that the definition of the parameters satisfying (17) is not a crucial part for the performance.
Suitable values are

\[ \varepsilon = 10^{-7}, \quad \beta = 0.1, \quad \mu = 0.1, \quad \bar\delta = 0.9, \quad \bar\rho = 100. \]

A stopping criterion has been omitted to facilitate the description of the algorithm. For a
suitable condition, one could use Powell's [10] proposal or any other rules, for example

\[
\begin{array}{l}
d_k^T B_k d_k < \varepsilon^2, \\[4pt]
\sum_{j=1}^{m} |u_j^{(k)} g_j(x_k)| < \varepsilon, \\[4pt]
\| \nabla_x L(x_k, u_k) \|^2 < \varepsilon, \\[4pt]
\sum_{j=1}^{m_e} |g_j(x_k)| + \sum_{j=m_e+1}^{m} |\min(0, g_j(x_k))| < \sqrt{\varepsilon}.
\end{array}
\]

The corresponding tolerance $\varepsilon$, which must be provided by the user, could also be applied to
define the active set $J_k^*$, cf. (6). Here, $\varepsilon$ should be sufficiently small so that

\[ 0 < \varepsilon < g_j(x^*) \]

for all $j > m_e$ with $g_j(x^*) > 0$ and for the optimal solution $x^*$. If $\varepsilon$ is chosen too large, the only
disadvantage is that some additional gradient evaluations are required.
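
A sketch of the example stopping rules is given below. Requiring all four tests to hold at once is an assumption of this illustration, as are the function name `stopping_test` and its arguments; `grad_L` denotes the already evaluated gradient $\nabla_x L(x_k, u_k)$ and `dBd` the inner product $d_k^T B_k d_k$.

```python
import numpy as np

def stopping_test(dBd, u, g, grad_L, m_e, eps):
    """Return True if all four example stopping conditions hold for tolerance eps."""
    u, g, grad_L = (np.asarray(a, dtype=float) for a in (u, g, grad_L))
    # Constraint violation: |g_j| for equalities, |min(0, g_j)| for inequalities.
    infeasibility = np.abs(g[:m_e]).sum() + np.abs(np.minimum(0.0, g[m_e:])).sum()
    return bool(dBd < eps ** 2                    # small search direction
                and np.abs(u * g).sum() < eps     # complementarity
                and grad_L @ grad_L < eps         # stationarity of the Lagrangian
                and infeasibility < np.sqrt(eps)) # feasibility
```

The feasibility test is the loosest of the four, since it is compared against $\sqrt{\varepsilon}$ rather than $\varepsilon$.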

A user often has a suitable guess for the starting point $x_0$. If nothing is known about the
multiplier and the Hessian of the Lagrangian, one could define $v_0 = 0$, $B_0 = I$, and one could
set $\rho_0 = 1$, $r_j^{(0)} = 1$, $j = 1, \dots, m$, or even larger, if a numerically stable algorithm for solving
the quadratic subproblem is available.

Numerical tests indicate that the penalty parameter $\rho_k$ in (7) could influence the performance
of Algorithm (3.1). For this reason, the numerical implementation [15] contains an additional
option to solve the quadratic subproblem (5) first, and to proceed to (7) only if (5) turns out to be
infeasible. Note that the convergence results of Section 4 remain valid if this option is preferred.
Furthermore, the choice of $\rho_k$ by (8) is adapted to the current state of the algorithm to avoid
unnecessarily ill-conditioned matrices of the form

\[ \begin{pmatrix} B_k & 0 \\ 0 & \rho_k \end{pmatrix} \]

in the quadratic programming routine.

The loop in Step 1) of Algorithm (3.1) could fail only if the subproblem does not satisfy a
constraint qualification. In this case, the modified search direction (18) is used with the intention
of minimizing the augmented Lagrange function $\Phi_{r_k}$. The loop between Step 3) and Step 1) is
finite, since a lower bound for the choice of $\rho_k$ can be given, cf. Section 4.

When solving the subproblem (7) by any "black box" quadratic programming subroutine,
one overlooks the fact that in a quasi-Newton implementation, the matrix $B_k$ is updated by only
a rank-two correction. To improve the numerical efficiency of the algorithm, in particular, if only
a few constraints are active, one could use a Cholesky factorisation of $B_k$. For a description
of the corresponding LDL-factors see [14]. Then the quadratic subproblem is identical with a
least-squares problem which could, for example, be solved by the programs published in Lawson
and Hanson [8].

The definition of the penalty parameter $r_{k+1}$ is closely related to the algorithm presented in
[13]. However, there are two differences. First, the penalty parameters are chosen individually
for each constraint, and second, bounded parameters are not expected to be constant as in [13],
so that the resulting algorithm should be more efficient and robust. Nevertheless it is possible
that $r_j^{(k)}$ tends to infinity. The convergence analysis of Section 4 will show that in this case, the
convergence of Algorithm (3.1) can be proved without using a line search, which indicates that this
situation should occur rarely in practical situations. The specific choice of $r_{k+1}$ is motivated by
the convergence requirement to generate descent directions for the augmented Lagrange function
$\Phi_{r_{k+1}}$. The parameter $r_{k+1}$ will be large when the improvement in the approximation of the
variables, i.e. $d_k$, will be smaller than the improvement in the approximation of the multipliers,
i.e. $u_k - v_k$.

The line search procedure of Step 5) in Algorithm (3.1) is a very simple method and is justified
by the excellent numerical results obtained with the original implementation of Powell, cf. [12],
and in further tests, cf. [14,15]. It is expected that only for badly scaled problems, this procedure
should be replaced by a more sophisticated algorithm, cf. for example Gill, Murray, and Wright
[3]. A straightforward classroom calculation shows that the quadratic approximation of $\varphi_k(\alpha)$ is
minimized by the expression

\[
\bar\alpha_{k,i} = \frac{\tfrac{1}{2} \alpha_{k,i}^2 \, \varphi_k'(0)}{\alpha_{k,i} \varphi_k'(0) - (\varphi_k(\alpha_{k,i}) - \varphi_k(0))}
\tag{20}
\]

with

\[ \varphi_k'(0) = \nabla \Phi_{r_{k+1}}(x_k, v_k)^T \begin{pmatrix} d_k \\ u_k - v_k \end{pmatrix}. \]

It will be shown in Section 4 that $\varphi_k'(0) < 0$, and that the line search algorithm is finite.
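
The step length rule of Step 5) together with formula (20) can be sketched as follows. This is a minimal illustration, not the implementation of [15]: the function name `line_search`, the iteration cap `max_iter`, and the decision to test the unit step $\alpha = 1$ first are assumptions made here. The default values of `beta` and `mu` are the ones suggested above.

```python
def line_search(phi, dphi0, beta=0.1, mu=0.1, max_iter=50):
    """Find alpha with phi(alpha) <= phi(0) + mu * alpha * phi'(0), cf. (19).

    phi   : callable, the one-dimensional merit function phi_k(alpha)
    dphi0 : the derivative phi_k'(0), which must be negative
    """
    if dphi0 >= 0.0:
        raise ValueError("phi'(0) must be negative (descent direction required)")
    phi0 = phi(0.0)
    alpha = 1.0
    for _ in range(max_iter):
        phi_a = phi(alpha)
        if phi_a <= phi0 + mu * alpha * dphi0:
            return alpha
        # Minimiser of the quadratic through phi(0), phi'(0), phi(alpha), cf. (20).
        # The denominator is negative whenever the test above has failed.
        alpha_bar = 0.5 * alpha ** 2 * dphi0 / (alpha * dphi0 - (phi_a - phi0))
        # Safeguard: never reduce the step by more than the factor beta.
        alpha = max(beta * alpha, alpha_bar)
    raise RuntimeError("no acceptable step length found")
```

For an exactly quadratic merit function the interpolation step recovers the true minimiser after one reduction, which is the situation used in the check of this sketch.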

When investigating Step 6) of Algorithm (3.1), the choice of the variables $x_{k(j)}$ in (6) can be
explained. In the matrix defining the linear constraints of the subproblem, only those rows are
replaced in the $k$-th iteration step, for which $j \in J_{k+1}^*$. The others remain as the previously
computed gradients.

Finally, a suitable approximation of the Hessian of the Lagrangian must be found. The extensive
numerical experience gathered in recent years shows that this Hessian can be approximated
by a variable metric formula with positive-definite matrices $B_k$, even if the true Hessian of the
Lagrange function is indefinite. Since excellent numerical results are obtained with Powell's
modification of the BFGS-formula, cf. [12], the usage of this formula or its equivalent inverse
formulation, if one wants to avoid the inversion of triangular factors, is recommended. For more
information about this variable metric formula, see Powell [10] or [14] for the definition of the
corresponding LDL-factors.
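
The text does not state the update formula itself. As an illustration only, the following sketch implements the damped BFGS update commonly attributed to Powell, with the usual constants 0.2 and 0.8; whether the implementation in [15] uses exactly this form is an assumption. Here `s` stands for $x_{k+1} - x_k$ and `y` for the difference of the Lagrangian gradients at the two iterates, and the name `damped_bfgs_update` is hypothetical.

```python
import numpy as np

def damped_bfgs_update(B, s, y):
    """Damped BFGS update that keeps B positive-definite even if s^T y <= 0.

    B : current symmetric positive-definite matrix (n x n)
    s : step x_{k+1} - x_k
    y : change of the Lagrangian gradient along the step
    """
    B, s, y = np.asarray(B, float), np.asarray(s, float), np.asarray(y, float)
    Bs = B @ s
    sBs = s @ Bs
    sy = s @ y
    # Damping factor: 1 (plain BFGS) if the curvature s^T y is large enough.
    theta = 1.0 if sy >= 0.2 * sBs else 0.8 * sBs / (sBs - sy)
    # Replace y by a convex combination with B s so that s^T eta >= 0.2 s^T B s.
    eta = theta * y + (1.0 - theta) * Bs
    return B - np.outer(Bs, Bs) / sBs + np.outer(eta, eta) / (s @ eta)
```

With the damping active, the updated matrix satisfies $s^T B_{k+1} s = 0.2 \, s^T B_k s$, so positive-definiteness is retained although the true curvature along `s` is negative.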


4.  Global  convergence  analysis 

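The convergence analysis of Algorithm (3.1) depends mainly on the Kuhn-Tucker conditions
for the quadratic programming subproblem (7) which can be written in the following form:

\[
\begin{array}{lll}
a) & B_k d_k + \nabla f(x_k) - \sum_{j \in J_k^*} u_j^{(k)} \nabla g_j(x_k) - \sum_{j \in K_k^*} u_j^{(k)} \nabla g_j(x_{k(j)}) = 0, & \\
b) & \rho_k \delta_k + \sum_{j \in J_k^*} u_j^{(k)} g_j(x_k) - \nu_1^{(k)} + \nu_2^{(k)} = 0, & \\
c) & w_j^{(k)} = 0, & j = 1, \dots, m_e, \\
d) & w_j^{(k)} \ge 0, & j = m_e + 1, \dots, m, \\
e) & 0 \le \delta_k \le 1, & \\
f) & u_j^{(k)} \ge 0, & j = m_e + 1, \dots, m, \\
g) & \nu_1^{(k)} \ge 0, & \\
h) & \nu_2^{(k)} \ge 0, & \\
i) & w_j^{(k)} u_j^{(k)} = 0, & j = 1, \dots, m, \\
j) & \nu_1^{(k)} \delta_k = 0, & \\
k) & \nu_2^{(k)} (1 - \delta_k) = 0, &
\end{array}
\tag{21}
\]

where

\[
\begin{array}{ll}
w_j^{(k)} = \nabla g_j(x_k)^T d_k + (1 - \delta_k) g_j(x_k), & j \in J_k^*, \\
w_j^{(k)} = \nabla g_j(x_{k(j)})^T d_k + g_j(x_k), & j \in K_k^*.
\end{array}
\tag{22}
\]

$\nu_1^{(k)}$ and $\nu_2^{(k)}$ are the multipliers with respect to the lower and upper bounds for the additional
variable $\delta$.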

First  we  have  to  investigate  whether  Algorithm  (3.1)  is  well  defined  and  start  with  considering 
the  internal  loop  of  Step  1). 


(4.1)  Lemma:  Assume  that  (7)  satisfies  the  constraint  qualification,  i.e.  that  the  gradients  of  the 
constraints  active  at  the  optimal  solution  are  independent,  and  that  the  feasible  region  of  (7)  is 
bounded  for  each  k.  Then  the  loop  in  Step  1)  of  Algorithm  (3.1)  is  finite. 

Proof: To simplify the proof, we omit the iteration index $k$ and assume that there are infinitely
many $\rho_i$ with $\lim_{i \to \infty} \rho_i = \infty$, each giving a solution $d_i$, $\delta_i$ of (7) and a multiplier $u_i$. Since
$\delta_i > \bar\delta > 0$, we obtain from (21b)

\[
0 \ge \rho_i \delta_i + \sum_{j \in J^*} u_j^{(i)} g_j(x) \ge \rho_i \bar\delta + \sum_{j \in J^*} u_j^{(i)} g_j(x),
\]

indicating that $\lim_{i \to \infty} \| u_i \| = \infty$. If $\bar u_i$ denotes the non-zero part of $u_i$ and $A_i$ the matrix
consisting of the corresponding gradients $\nabla g_j$, we write (21a) in the form

\[ B d_i + \nabla f(x) - A_i \bar u_i = 0 \]

or

\[ \bar u_i = (A_i^T A_i)^{-1} A_i^T (B d_i + \nabla f(x)). \]

Since, however, $\{d_i\}$ is bounded, we obtain a contradiction.
The boundedness of the feasible region in (7) will henceforth be assumed in the further
global convergence analysis. The iterates $x_k$ can be forced to remain bounded, if additional lower
and upper bounds are given in (1), i.e. if there are $x_l$, $x_u \in \mathbb{R}^n$ with

\[ x_l \le x \le x_u, \tag{23} \]

also implying the boundedness of $\{d_k\}$ provided that $\{\alpha_k\}$ does not approximate zero.

The subsequent theorem will be fundamental for the convergence analysis. It shows that
the computed search direction is a descent direction for $\Phi_{r_{k+1}}$, i.e. that the line search is well
defined, and that it leads to a sufficiently large decrease of $\Phi_{r_{k+1}}$. First, some notation will be
introduced to facilitate the proof. If $x_k$, $v_k$, $J_k$ are some iterates of Algorithm (3.1) and $r_{k+1}$ the
corresponding penalty parameter, we set


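\[
\begin{array}{ll}
\bar u_k = (\bar u_1^{(k)}, \dots, \bar u_m^{(k)})^T, &
\bar u_j^{(k)} = \begin{cases} u_j^{(k)}, & \text{if } j \in J_k, \\ 0, & \text{otherwise}, \end{cases} \\[10pt]
\bar v_k = (\bar v_1^{(k)}, \dots, \bar v_m^{(k)})^T, &
\bar v_j^{(k)} = \begin{cases} v_j^{(k)}, & \text{if } j \in J_k, \\ 0, & \text{otherwise}, \end{cases} \\[10pt]
g_k = (g_1(x_k), \dots, g_m(x_k))^T, & \\[6pt]
\bar g_k = (\bar g_1(x_k), \dots, \bar g_m(x_k))^T, &
\bar g_j(x_k) = \begin{cases} g_j(x_k), & \text{if } j \in J_k, \\ 0, & \text{otherwise}, \end{cases} \\[10pt]
\hat g_k = (\hat g_1(x_k), \dots, \hat g_m(x_k))^T, &
\hat g_j(x_k) = \begin{cases} g_j(x_k), & \text{if } j \in J_k, \\ v_j^{(k)} / r_j^{(k+1)}, & \text{otherwise}, \end{cases} \\[10pt]
A_k = (\nabla g_1(x_k), \dots, \nabla g_m(x_k)), & \\[6pt]
R_{k+1} = \operatorname{diag}(r_1^{(k+1)}, \dots, r_m^{(k+1)}). &
\end{array}
\tag{24}
\]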


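Then we can express the gradient of $\Phi_{r_{k+1}}(x_k, v_k)$ in the following form:

\[
\nabla \Phi_{r_{k+1}}(x_k, v_k) = \begin{pmatrix} \nabla f(x_k) - A_k (\bar v_k - R_{k+1} \bar g_k) \\ -\hat g_k \end{pmatrix}.
\tag{25}
\]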
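
The gradient of the augmented Lagrange function, i.e. the derivative of (10) with respect to $x$ and $v$, can be evaluated as in the following sketch. The function name `grad_augmented_lagrangian` and its interface are assumptions for this illustration; `A` holds the constraint gradients as columns, and the index set is determined as in (11).

```python
import numpy as np

def grad_augmented_lagrangian(grad_f, A, g_val, v, r, m_e):
    """Return (grad_x, grad_v) of Phi_r at a point with given f-gradient and constraint data.

    grad_f : gradient of f, shape (n,)
    A      : matrix (n x m) whose columns are the constraint gradients
    g_val  : constraint values, shape (m,)
    v, r   : multiplier estimates and penalty parameters (r > 0), shape (m,)
    m_e    : number of equality constraints
    """
    g_val, v, r = (np.asarray(a, dtype=float) for a in (g_val, v, r))
    m = g_val.size
    in_J = (np.arange(m) < m_e) | (g_val <= v / r)
    v_bar = np.where(in_J, v, 0.0)        # multipliers restricted to J
    g_bar = np.where(in_J, g_val, 0.0)    # constraint values restricted to J
    g_hat = np.where(in_J, g_val, v / r)  # g_j on J, v_j / r_j elsewhere
    grad_x = np.asarray(grad_f, float) - np.asarray(A, float) @ (v_bar - r * g_bar)
    return grad_x, -g_hat
```

For $f(x) = x^2$ with the equality constraint $x - 1 = 0$ and the inequality $x \ge 0$, evaluated at $x = 2$ with $v = (1, 1)$ and $r = (2, 2)$, differentiating (10) by hand gives $\partial \Phi / \partial x = 2x - v_1 + r_1 (x - 1) = 5$ and $\nabla_v \Phi = (-1, -0.5)$, which is what the sketch returns.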


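(4.2) Theorem: Let $x_k$, $v_k$, $d_k$, $\delta_k$, $u_k$, $B_k$, $r_k$, $\rho_k$, and $J_k^*$ be given iterates of Algorithm (3.1),
$k \ge 0$, and assume that

(i)   $d_k^T B_k d_k \ge \gamma \, \| d_k \|^2$,

(ii)  $\delta_k \le \bar\delta$,

(iii) $\rho_k \ge \dfrac{\| A_k \bar v_k \|^2}{\gamma (1 - \bar\delta)^2}$,

for a $\gamma \in \mathbb{R}$ with $0 < \gamma < 1$. Then

\[
\nabla \Phi_{r_{k+1}}(x_k, v_k)^T \begin{pmatrix} d_k \\ u_k - v_k \end{pmatrix} \le -\tfrac{1}{4} \gamma \, \| d_k \|^2.
\tag{26}
\]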

Proof: We use the Kuhn-Tucker conditions (21) for the quadratic subproblem and get (26) by the
following estimates, where we omit the iteration index $k$:


=  -Vf(*)Td  +  dTA[(J  -  Rf)  +  J*T(u  -  v) 

=  dTBd  —  53  «yV?/(*)r<*-  E  **y V9j{*m)Ti 

ye/*  ye/c* 

+  E^  ~  r/ ?/(*))  Vjy(*)Td  +  53  »(*)(“/  —  »/) 

ye/  ye/ 


+  X)  rVj(Ui  _  ^ 

,7=*  ri 


ye/c 

=  dTBd—  53  (“j  —  wy)^ffy(*)r^ —  E  w(*k(j))Td 

ye/*  ye/c* 

-  E  «'/V?y(z)7’<i  -  E  »’/fy(*)Vfy(*)7’‘* 

ye/*\/  ye/ 

+  E(“y -  wy)«»(*)  +  E  ~  v>) 

jeJ  ye/c  •» 

=  drBd  -  53  («y  -  »y)»y  +  (1  —  *)  E  («y —  VMX) 
ye/*  ye/* 

-  53  iw  +  E  uy*y(*)  ~  E  t,y*y 

ye/c*  ye/c*  ye/*\/ 

+  (i  -  V  E  wyM*)  -  E  rw(*)wi  +  (i  -  6)  E  fyfy(*)a 

ye/*\/  ye/  ye/ 

+  E(“/ _  wy)fe(*)  +  E  r^(u/' _  •*) 

ye/  ye/c r/ 

=  drBd + 53  «y(»y  -  'Mi*))  +  E  ®ywy  + 2  E(“> _  VM*) 

ye/  ye/*\/  ye/ 

+  E  (*y —  vj)fj(z) ~6  E (“y - *'y)?y(*)  +  E  “y*y(*) 

ye/*\/  ye/*  ye/c* 

-  E  *'/■“'/  +  (i  —  0  E  wy?y(*)  +  (!  - *)  E  r/i «(*)a 

ye/*\/  ye/*\/  ye/ 


+  E  rWj(uy  _ 

ye/c r/ 

>  dTBd  +  2  53(U>  -  Vj)9i(z)  +  2  53  Ittfuy  -  „,) 

ye/  ye/c  / 

-  E  -M“y  -  *y)  +  E  **/«(*)  - *  E  (“y  —  ®y)»y(*) 


ye/c 


ye/*\/ 


ye/* 


+  E  “yfy(*)  +  (i  —  0  E  ry*y(*)3 

ye/c*  ye/ 

-*  E  #) 

ye/*\/ 


cf.  (21a), (24) 


ef.(lS) 


cf.(22) 


cf.(21i) 


ef.(14),(21d) 




=  JrSd  +  2^(«-,)-  £  2 

J€AT*  *  i€K\IC*ri 

+  52  ui9ji*)  —  6  52  “/»>(*)  —  1 52(ui  —  »y)«ly(*) 
ye*\**  ye/*\/  ye/ 

+  52  uj9j(*) + (i — s)  52  »vfc(*)a 

ye/ 


Sex* 


<*r5i  +  2f,r(u  -  V)  +  53  «*y(fy(x)  -  -t>y)  +  J2  -tf? 

ye**  ri  ye#r*  fJ 

+  E  “»w*)-^)+  E  ~vf - *  E  */*(*) 

)£K\K*  3  SeX\X*  3 


&X\X*  * 3  jelC\K* 

-  tf  J2(«y  -  Vj)tj(x)  +  (1  -  S)  X2  »-yfy(*)a 

ie/  ye/ 

=  dTBd  +  2/r(u  - 1>)  +  (1  -  +  £  «y(*y(x)  -  -l>y) 

ye*  fy 

+  E  ~vs  ~  6  E  uyfy(*)  - 6  E(“y  ~  wy)fy(*) 


ye/ 


>  dTBd  +  2/r(«  -  w)  +  (l  -  6)9tR9  +  (i  -  s)  52  -vf 

sex  ri 

~ 6  E  uy?y(*)  +  *  E  wy«y(*) 

ye/*  ye/ 

>  drfl<f  +  2y,r(u  -  v)  +  (1  -  i)*,rK*'  +  ,«a 
—  vx6  +  SVTf  —  -L-cT0 

=  dTBd+  ||  Vl  -  SR'/tf  + l R-i/»( u  -  „)  ||8 

VT=S 

~  i~7(“  ~  »)T#-1(«  —  v)  +  pt2 
+  tf(0rJ  — 

>  dTBd—  YZSg  (w  —  *Orrt-l(u  —  v) 

+  (V*  +  - 1(^1  - 

>  Jrfrflrf  +  \dTBd  —  -L  -(u  —  »)ri?~*(u  —  v ) 




cf.(15) 


cf.(21f),(14) 

cf.(21b,d) 

cf.(21j) 


>  ji  m  f  +iJrsi-  if;  i(»,  - »,)» 

_^(<'T(,_r^7e))>  cf(i> 

>  m  »*  -  -Lj  |  -  »,)’ 

cf(12) 

=  i7  M  II2  -v(fLfy(Pr^a  cf  (22) 

>t-lNII‘-<(1*~^i,a  IHMW  am 

>h\\d  ||a  .  Cf.(i),(ii) 


During the proof we used $v_j^{(k)} \ge 0$ for all $j > m_e$, since $\alpha_{k-1} \le 1$, and we set

\[ R^{1/2} = \operatorname{diag}(\sqrt{r_1}, \dots, \sqrt{r_m}). \]

■ 

Assumption (i) can be considered as a standard assumption required in the theory
of quasi-Newton algorithms. It can be forced by choosing a $\gamma$ and performing a restart with
$B_k = I$ whenever (i) is violated. The validity of assumption (ii) is guaranteed by Step 1) of
Algorithm (3.1), since Lemma (4.1) shows that after finitely many sub-iterations, the condition
$\delta_k \le \bar\delta$ will be achieved at least under a constraint qualification. Otherwise, $d_k$ and $u_k$ define
a descent direction for $\Phi_{r_{k+1}}$ in the case when they are replaced by (18). To avoid expensive
calculations for obtaining the lower bound (iii) of the penalty parameter, $\rho_k$ is defined by (8),
since

\[
d_{k-1}^T A_{k-1} u_{k-1} = d_{k-1}^T B_{k-1} d_{k-1} + d_{k-1}^T \nabla f(x_{k-1})
\tag{27}
\]

and all inner products are previously computed in the algorithm. Furthermore, the lower bound
in (iii) does not depend on $d_k$, $u_k$, or $\rho_k$, which implies that the loop between Step 3) and Step
1) of Algorithm (3.1) is finite.

(4.3) Corollary: The loop between Step 3) and Step 1) of Algorithm (3.1) is finite.

To show that the line search of Step 5) of Algorithm (3.1) defines a finite sub-iteration, we
use the following estimate for $\alpha_{k,i}$:

(4.4) Lemma: Let $k$ denote the $k$-th iteration of Algorithm (3.1) and assume that $\varphi_k'(0) < 0$.
Then

\[
\alpha_{k,i+1} \le \max\left( \beta, \frac{1}{2(1 - \mu)} \right) \alpha_{k,i}
\tag{28}
\]

whenever (19) is not valid for some $i \ge 0$.




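Proof: From $\varphi_k'(0) < 0$ and the violation of (19), we obtain

$$\varphi_k(\alpha_{k,i}) - \varphi_k(0) > \mu\, \alpha_{k,i}\, \varphi_k'(0)$$

and, cf. (20),

$$\bar\alpha_{k,i} = \frac{1}{2}\, \alpha_{k,i}\, \frac{\alpha_{k,i}\, \varphi_k'(0)}{\alpha_{k,i}\, \varphi_k'(0) - (\varphi_k(\alpha_{k,i}) - \varphi_k(0))}
< \frac{1}{2}\, \alpha_{k,i}\, \frac{\alpha_{k,i}\, \varphi_k'(0)}{\alpha_{k,i}\, \varphi_k'(0) - \mu\, \alpha_{k,i}\, \varphi_k'(0)}
= \frac{1}{2(1-\mu)}\, \alpha_{k,i},$$

so that

$$\alpha_{k,i+1} \le \max\Big(\beta, \frac{1}{2(1-\mu)}\Big)\, \alpha_{k,i}.$$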


Since $\alpha_{k,i} \to 0$ for $i \to \infty$ and $\varphi_k'(0) < 0$ is impossible without violating (19), we get:

(4.5) Corollary: The line search procedure of Step 3) of Algorithm (3.1) is finite provided that
$\varphi_k'(0) < 0$.
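The step-length reduction estimated in Lemma (4.4) can be illustrated with a short sketch. It assumes that the acceptance test (19) has the form $\varphi(\alpha) \le \varphi(0) + \mu \alpha \varphi'(0)$ and that (20) takes the larger of $\beta \alpha$ and the minimiser of the quadratic interpolating $\varphi(0)$, $\varphi'(0)$ and $\varphi(\alpha)$; the function name `line_search`, the default parameter values and the test function `phi_example` are hypothetical and chosen only for illustration.

```python
import math


def line_search(phi, dphi0, beta=0.1, mu=0.1, max_iter=50):
    """Armijo-type step reduction with quadratic interpolation.

    phi   : callable, the one-dimensional merit function phi(alpha)
    dphi0 : its derivative at alpha = 0, assumed negative
    Returns the accepted step length and the list of all trial steps.
    """
    if dphi0 >= 0.0:
        raise ValueError("phi'(0) must be negative")
    phi0 = phi(0.0)
    alpha = 1.0
    trials = [alpha]
    for _ in range(max_iter):
        phi_a = phi(alpha)
        # Sufficient decrease test, assumed form of (19).
        if phi_a <= phi0 + mu * alpha * dphi0:
            return alpha, trials
        # Minimiser of the quadratic through phi(0), phi'(0), phi(alpha).
        # The denominator is negative whenever the test above fails.
        alpha_bar = 0.5 * alpha * alpha * dphi0 / (alpha * dphi0 - (phi_a - phi0))
        # Safeguarded reduction, assumed form of (20).
        alpha = max(beta * alpha, alpha_bar)
        trials.append(alpha)
    raise RuntimeError("no acceptable step length found")


def phi_example(alpha):
    """Hypothetical merit function with phi(0) = 1 and phi'(0) = -5."""
    return math.exp(5.0 * alpha) - 10.0 * alpha


DPHI0_EXAMPLE = -5.0
```

Calling `line_search(phi_example, DPHI0_EXAMPLE)` produces trial steps whose successive ratios stay below $\max(\beta, 1/(2(1-\mu)))$, as in (28). The bound forces a genuine reduction only when $\mu < 1/2$.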

Now  we  are  able  to  prove  the  following  convergence  theorem: 

(4.6) Theorem: Let $x_k$, $v_k$, $d_k$, $\delta_k$, $u_k$, $B_k$, $r_k$, $\rho_k$, and $J_k^*$ be given iterates of Algorithm (3.1),
$k \ge 0$, and assume that there are positive constants $\gamma$ and $\bar\delta$ with

(i) $d_k^T B_k d_k \ge \gamma \|d_k\|^2$ for all k,

(ii) $\delta_k \le \bar\delta$ for all k,

on  '»  *"  *• 

(iv) $\{x_k\}$, $\{d_k\}$, $\{u_k\}$, and $\{B_k\}$ are bounded.

Then there exists for each $\varepsilon > 0$ a k with

a) $\|d_k\| < \varepsilon$,

b) $\|R_{k+1}^{-1/2}(u_k - v_k)\| < \varepsilon$.

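Proof: First note that the boundedness of $\{u_k\}$ implies the boundedness of $\{v_k\}$, since $\alpha_k \le 1$
for all k. To show a), let us assume that there is an $\varepsilon > 0$ with

$$\|d_k\| \ge \varepsilon \qquad (29)$$

for all k. From the definition of $r_j^{(k+1)}$, $k \ge 0$, we obtain either $r_j^{(k+1)} = r_j^{(0)}$ or

$$r_j^{(k+1)} = \frac{2m\,(u_j^{(k^*)} - v_j^{(k^*)})^2}{(1 - \delta_{k^*})\, d_{k^*}^T B_{k^*} d_{k^*}}
\le \frac{2m\,(u_j^{(k^*)} - v_j^{(k^*)})^2}{(1 - \bar\delta)\, \gamma\, \varepsilon^2}$$

for some $k^* \le k$, $j = 1, \ldots, m$. Since $u_k$ and therefore also $v_k$ are bounded, we conclude that
$\{r_k\}$ remains bounded and Lemma (2.1) implies that there is some $r > 0$ with

$$\lim_{k \to \infty} r_k = r.$$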


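Now consider any iteration index k. Then

$$\Phi_{r_{k+1}}(x_{k+1}, v_{k+1}) \le \Phi_{r_{k+1}}(x_k, v_k) + \mu\, \alpha_k\, \varphi_k'(0)
\le \Phi_{r_{k+1}}(x_k, v_k) - \tfrac{1}{4}\, \mu\, \alpha_k\, \gamma\, \|d_k\|^2, \qquad (31)$$

cf. Theorem (4.2). Next we have to prove that $\alpha_k$ cannot tend to zero. Let $k \ge 0$, and

$$z_k = \begin{pmatrix} x_k \\ v_k \end{pmatrix}, \qquad p_k = \begin{pmatrix} d_k \\ u_k - v_k \end{pmatrix}.$$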


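Since all functions defining $\Phi_r$ are continuously differentiable, $r_{k+1}$ is bounded, and $z_k$, $p_k$ remain
in a compact subset of $\mathbb{R}^{n+m}$, we can find $\bar\alpha > 0$ with

$$|\nabla\Phi_{r_{k+1}}(z_k + \alpha p_k)^T p_k - \nabla\Phi_{r_{k+1}}(z_k)^T p_k|
\le \|\nabla\Phi_{r_{k+1}}(z_k + \alpha p_k) - \nabla\Phi_{r_{k+1}}(z_k)\|\, \|p_k\| \qquad (33)$$
$$\le \tfrac{1}{8}(1 - \mu)\, \gamma\, \varepsilon^2$$

for all $\alpha \le \bar\alpha$ and for all k. Using the mean value theorem, we obtain $\xi_k \in [0, 1)$ with

$$\Phi_{r_{k+1}}(z_k + \alpha p_k) - \Phi_{r_{k+1}}(z_k) - \mu \alpha \nabla\Phi_{r_{k+1}}(z_k)^T p_k$$
$$= \alpha \nabla\Phi_{r_{k+1}}(z_k + \xi_k \alpha p_k)^T p_k - \mu \alpha \nabla\Phi_{r_{k+1}}(z_k)^T p_k$$
$$\le \alpha \nabla\Phi_{r_{k+1}}(z_k)^T p_k + \tfrac{1}{8}\alpha (1 - \mu) \gamma \varepsilon^2 - \mu \alpha \nabla\Phi_{r_{k+1}}(z_k)^T p_k \qquad \text{cf. (33)}$$
$$\le -\tfrac{1}{4}\alpha (1 - \mu) \gamma \|d_k\|^2 + \tfrac{1}{8}\alpha (1 - \mu) \gamma \varepsilon^2 \qquad \text{cf. Th. (4.2)}$$
$$\le -\alpha \big( \tfrac{1}{4}(1 - \mu) \gamma \varepsilon^2 - \tfrac{1}{8}(1 - \mu) \gamma \varepsilon^2 \big) \qquad \text{cf. (29)}$$
$$= -\tfrac{1}{8}\alpha (1 - \mu) \gamma \varepsilon^2$$
$$< 0$$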

for all k and all $\alpha \le \bar\alpha$. From (28) we conclude

$$\beta^i \le \alpha_{k,i} \le \bar\beta^i$$

for all $i \ge 0$, where

$$\bar\beta := \max\Big(\beta, \frac{1}{2(1-\mu)}\Big).$$

Therefore, there is an $i_0$ independent of k so that (19) is satisfied for all $i \ge i_0$ and all k. Since
$i_k$ is the first index for which $\alpha_{k,i}$ satisfies (19), we conclude that $\alpha_k$ does not approach zero, i.e.

$$\alpha_k = \alpha_{k,i_k} \ge \beta^{i_k} \ge \beta^{i_0}.$$

Together with (31) we obtain


+fk+x(n+i)  <  #»*+.(**)  -  2* 


Now  we  consider  the  difference 


^r*+t(*S+l)  —  ^rk+,(*H- 1) 


I’eJCh+i 


+  E('S‘+0»i(*‘+>)-ir<‘+‘>»i(*»+.),)+l  £  »<*+')I/r<*+‘) 

J'€/*  J6JT* 

with 

4k+i  =  {l,...,m«}  U  {j  :  m,  <  j  <  m,  fr(xk+i)  <  t>(fc+1)/r(*+3,>, 

Jk  =  {l,...,m.}  U  {j  :  m«  <  j  <  m,  fc(**+i)  <  »(*+1)/r<*+1)}, 

and  Jf/i+i,  Kk  are  the  corresponding  complements.  Since  rk-|_i  -»  r  >  0  for  Jt  -+  oo, 
and  vt+i  are  bounded,  we  get 

—  *rt+1(*k+»)  <  ? 

for  all  sufficiently  large  k.  This  leads  to 

^*+«(*k+l)  —  ^r*+i(*k+l)  "t-  f  ^ 

cf. (34), for all sufficiently large k and to a contradiction, since $\{\Phi_{r_{k+1}}(z_k)\}$ is bounded below.
This shows statement a). Statement b) follows from a), the definition of $r_{k+1}$, cf. (12), and the
boundedness of $\{B_k\}$:

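$$\sum_{j=1}^{m} \frac{(u_j^{(k)} - v_j^{(k)})^2}{r_j^{(k+1)}}
\le \sum_{j=1}^{m} \frac{(1 - \delta_k)\, d_k^T B_k d_k}{2m}
\le \tfrac{1}{2}\, \|B_k\|\, \|d_k\|^2.$$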


Note that Theorem (4.6) also treats the case in which the penalty parameters are unbounded.
In that case, the convergence analysis is simplified, since definition (12) of the penalty parameters
and the boundedness of $\{u_k\}$, $\{v_k\}$ imply that $\{d_k\}$ approaches zero. If, on the other hand, we
knew that the penalty parameters are bounded, then (12) shows that the statement

$$\|u_k - v_k\| < \varepsilon$$

could be added to the results of Theorem (4.6).

Most of the technique in the convergence proof of Theorem (4.6) is standard and well known
from unconstrained optimisation theory. It is repeated here for completeness. However, we must
be aware that

$$\Phi_{r_{k+2}}(z_{k+1}) > \Phi_{r_{k+1}}(z_{k+1})$$

is possible, implying that convergent penalty parameters are required to obtain a contradiction
to (29).
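The role of the penalty parameters in statement b) of Theorem (4.6) can be made concrete with a small sketch. It assumes that definition (12) takes the maximum of the previous value $r_j^{(k)}$ and the quotient $2m(u_j^{(k)} - v_j^{(k)})^2 / ((1 - \delta_k) d_k^T B_k d_k)$ that appears in the proof; the function names and the sample numbers are hypothetical.

```python
def update_penalties(r, u, v, delta, dBd):
    """Monotone penalty update, assumed form of definition (12).

    r     : current penalty parameters r_j (all positive)
    u, v  : multiplier estimates u_j and v_j
    delta : additional subproblem variable, 0 <= delta < 1
    dBd   : curvature term d^T B d, assumed positive
    """
    m = len(r)
    denom = (1.0 - delta) * dBd
    return [max(rj, 2.0 * m * (uj - vj) ** 2 / denom)
            for rj, uj, vj in zip(r, u, v)]


def scaled_multiplier_gap(r, u, v):
    """Sum of (u_j - v_j)^2 / r_j, i.e. ||R^(-1/2) (u - v)||^2."""
    return sum((uj - vj) ** 2 / rj for rj, uj, vj in zip(r, u, v))


# Hypothetical data for one iteration with m = 3 constraints.
r_old = [1.0, 1.0, 1.0]
u_k = [2.0, 0.0, 1.0]
v_k = [0.0, 0.0, 1.0]
delta_k = 0.5
dBd_k = 0.04

r_new = update_penalties(r_old, u_k, v_k, delta_k, dBd_k)
gap = scaled_multiplier_gap(r_new, u_k, v_k)
```

With this update the scaled gap never exceeds $\tfrac{1}{2}(1 - \delta_k) d_k^T B_k d_k$, which is why a small $\|d_k\|$ forces statement b), while the penalty parameters themselves can only increase.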

The statements of Theorem (4.6) can be used to show the approximation of a Kuhn-Tucker
point by Algorithm (3.1):

(4.7) Theorem: Let $x_k$, $v_k$, $d_k$, $\delta_k$, $u_k$, $B_k$, $J_k^*$ be computed by Algorithm (3.1) and assume that
all assumptions of Theorem (4.6) are valid. Then there exists an accumulation point $(x^*, u^*)$ of
$\{(x_k, u_k)\}$ satisfying the Kuhn-Tucker conditions (3) for problem (1).

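Proof: The boundedness of $\{x_k\}$, $\{u_k\}$ and the results of Theorem (4.6) guarantee the existence
of $x^* \in \mathbb{R}^n$, $u^* \in \mathbb{R}^m$, and an infinite subset $S \subset \mathbb{N}$ with

$$\lim_{k \in S} x_k = x^*,$$

$$\lim_{k \in S} u_k = u^*,$$

$$\lim_{k \in S} d_k = 0, \qquad (35)$$

$$\lim_{k \in S} \|R_{k+1}^{-1/2}(u_k - v_k)\| = 0.$$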

Since $\{\delta_k\}$ is bounded away from unity, (22) and (21c,d) give

$$g_j(x^*) = 0, \quad j = 1, \ldots, m_e,$$

$$g_j(x^*) \ge 0, \quad j = m_e + 1, \ldots, m,$$

showing that $x^*$ is feasible. From (21f) we get

$$u_j^* \ge 0, \quad j = m_e + 1, \ldots, m,$$

and (211) leads to

$$u_j^*\, g_j(x^*) = 0, \quad j = 1, \ldots, m. \qquad (36)$$

It remains to prove (3a). Assume now there exists a $j > m_e$ so that $j \in K_k^*$ for infinitely many
$k \in S$ (otherwise we are finished). The definition of $K_k^*$, cf. (6), implies $g_j(x^*) \ge \varepsilon$ and (36) gives
$u_j^* = 0$. We conclude from (21a) that

$$\nabla_x L(x^*, u^*) = 0.$$

■
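The conditions verified in this proof can be checked numerically for a candidate pair $(x^*, u^*)$. The sketch below assumes the Lagrangian $L(x, u) = f(x) - \sum_j u_j g_j(x)$ and the usual form of the Kuhn-Tucker conditions (stationarity, feasibility, non-negative multipliers for inequalities, complementarity); the function name `kkt_residual` and the two-variable example are hypothetical.

```python
def kkt_residual(grad_f, g, jac_g, u, m_e):
    """Largest violation of the Kuhn-Tucker conditions at a point.

    grad_f : gradient of f at x (list of length n)
    g      : constraint values g_j(x) (list of length m)
    jac_g  : constraint gradients, jac_g[j][i] = d g_j / d x_i
    u      : multiplier estimates (list of length m)
    m_e    : number of equality constraints (listed first)
    """
    n, m = len(grad_f), len(g)
    # Stationarity: grad f(x) - sum_j u_j grad g_j(x) = 0.
    stationarity = max(
        abs(grad_f[i] - sum(u[j] * jac_g[j][i] for j in range(m)))
        for i in range(n)
    )
    # Feasibility of equalities and inequalities.
    eq_violation = max((abs(g[j]) for j in range(m_e)), default=0.0)
    ineq_violation = max((max(0.0, -g[j]) for j in range(m_e, m)), default=0.0)
    # Sign condition on inequality multipliers.
    sign_violation = max((max(0.0, -u[j]) for j in range(m_e, m)), default=0.0)
    # Complementarity u_j g_j(x) = 0.
    complementarity = max((abs(u[j] * g[j]) for j in range(m)), default=0.0)
    return max(stationarity, eq_violation, ineq_violation,
               sign_violation, complementarity)


# Hypothetical example: minimise x1^2 + x2^2
# subject to x1 + x2 - 1 = 0 and x1 >= 0.
# At x = (0.5, 0.5) with u = (1, 0) all conditions hold.
res_opt = kkt_residual([1.0, 1.0], [0.0, 0.5], [[1.0, 1.0], [1.0, 0.0]], [1.0, 0.0], 1)
# At the feasible point x = (1, 0) the same multipliers violate stationarity.
res_bad = kkt_residual([2.0, 0.0], [0.0, 1.0], [[1.0, 1.0], [1.0, 0.0]], [1.0, 0.0], 1)
```

A residual of this kind is one way to monitor how closely an iterate approximates a Kuhn-Tucker point; the convergence theorems above concern the limit behaviour and do not prescribe this particular stopping test.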

The following corollary follows directly from the statements of Theorems (4.6), (4.7) and from
(21b).

(4.8) Corollary: Under the assumptions of Theorem (4.7), let S define an infinite subset of $\mathbb{N}$ so
that $(x_k, u_k)$ converge to a Kuhn-Tucker point $(x^*, u^*)$ of (1) for all $k \in S$. Then

a) $\lim_{k \in S} \delta_k = 0$.

b) If, in addition, the penalty parameters $r_k$ are bounded, then

$$\lim_{k \in S} v_k = u^*.$$

Assumptions (i) to (iii) of Theorem (4.6) are required to obtain descent directions for the
function $\Phi_r$. As noted in the beginning of this section, the boundedness of $\{x_k\}$ and $\{d_k\}$ can be
enforced by introducing additional bound constraints of the type (23), and sufficient conditions for
$\{u_k\}$ to remain bounded are given in [13]. They are mainly based on a constraint qualification
which must be satisfied in each subproblem. The assumptions of the convergence theorems
presented so far exclude the special case that a search direction has been obtained by (18). It
can be expected that this replacement occurs rarely if the nonlinear programming problem (1)
satisfies a constraint qualification at its optimal solution. If, on the other hand, (18) is always
used to define the new iterates, then (3.1) could be considered as a multiplier method and its
well-known convergence results can be applied.

5. Further comments

In addition to the global convergence behaviour outlined in the previous section, one could be
interested in the local convergence speed of Algorithm (3.1). The only statement to show is that
the steplength is one in a neighbourhood of the solution. Then (3.1) is identical with the original
method of Han and Powell and we can apply their local superlinear convergence results, cf. [5]
or [11], respectively. However, Algorithm (3.1) is closely related to the method presented in [13].
The only difference influencing the local convergence analysis is a slightly simplified choice of the
penalty parameters. Since both approaches are identical in principle, a repetition of the local
convergence analysis of [13] for the presented modified case is omitted.

Algorithm (3.1) has been implemented in a user-oriented way and has been tested extensively.
The usage of the program and its FORTRAN source are published in [15]. The numerical results
of [15] are obtained by executing the test problems published in Hock and Schittkowski [7], and
can be compared with the results given there. The subproblems of the kind (5) or (7), respectively,
are solved by the quadratic programming routine of Gill, Murray, Saunders and Wright [4] and
by a linear least-squares program based on the subroutines published in Lawson and Hanson [8].
Furthermore, the $L_1$-penalty function has been implemented to compare both approaches, and
two different line search algorithms are tested.

For further information about the numerical performance of other versions of Algorithm
(3.1), the reader is referred to [14]. Five different versions of the method of Wilson, Han, and
Powell are tested there which all realise the active set strategy and are based on a least-squares
formulation of the quadratic subproblem. They differ in the choice of the line search function,
the formulation of the subproblem, the solution method for the least-squares subproblem, and in
the way in which the gradients are computed. Furthermore, their performance can be compared
with the performance of the 26 optimisation programs tested in [12], and, in particular, with
the original implementation VF02AD of Powell and with OPRQP/XROP, two versions of Biggs'
[1] recursive quadratic programming method which uses an active set strategy to define equality
constrained quadratic programming subproblems.